Goto

Collaborating Authors

 top left corner


How to go back in time with Google Maps

Popular Science

You can access historical imagery through Street View. More information Adding us as a Preferred Source in Google by using this link indicates that you would like to see more of our content in Google News results. See what a street used to look like. Breakthroughs, discoveries, and DIY tips sent six days a week. By signing up, you confirm you are 16+, will receive newsletters and promotional content and agree to our Terms of Use and acknowledge the data practices in our Privacy Policy .


How to Control Everything on Your Phone With Your Voice (iOS and Android)

WIRED

Go fully hands-free with these tips for Android and iOS. With the arrival of digital assistant apps like Gemini and Siri, most of us have grown used to talking to our phones. But conversing with your Android or iOS device can go way beyond interacting with AI. You can also use your voice to launch apps, fill out text fields, and do just about everything that was previously only possible with your fingers and thumbs. Of course, the traditional touchscreen input will often be the way to go.


Embodied Red Teaming for Auditing Robotic Foundation Models

arXiv.org Artificial Intelligence

Language-conditioned robot models (i.e., robotic foundation models) enable robots to perform a wide range of tasks based on natural language instructions. Despite strong performance on existing benchmarks, evaluating the safety and effectiveness of these models is challenging due to the complexity of testing all possible language variations. Current benchmarks have two key limitations: they rely on a limited set of human-generated instructions, missing many challenging cases, and they focus only on task performance without assessing safety, such as avoiding damage. To address these gaps, we introduce Embodied Red Teaming (ERT), a new evaluation method that generates diverse and challenging instructions to test these models. ERT uses automated red teaming techniques with Vision Language Models (VLMs) to create contextually grounded, difficult instructions. Experimental results show that state-of-the-art models frequently fail or behave unsafely on ERT tests, underscoring the shortcomings of current benchmarks in evaluating real-world performance and safety. Code and videos are available at: https://sites.google.com/view/embodiedredteam.


LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

arXiv.org Artificial Intelligence

Large Language Models (LLMs) equipped with extensive world knowledge and strong reasoning skills can tackle diverse tasks across domains, often by posing them as conversation-style instruction-response pairs. In this paper, we propose LLaRA: Large Language and Robotics Assistant, a framework which formulates robot action policy as conversations, and provides improved responses when trained with auxiliary data that complements policy learning. LLMs with visual inputs, i.e., Vision Language Models (VLMs), have the capacity to process state information as visual-textual prompts and generate optimal policy decisions in text. To train such action policy VLMs, we first introduce an automated pipeline to generate diverse high-quality robotics instruction data from existing behavior cloning data. A VLM finetuned with the resulting collection of datasets based on a conversation-style formulation tailored for robotics tasks, can generate meaningful robot action policy decisions. Our experiments across multiple simulated and real-world environments demonstrate the state-of-the-art performance of the proposed LLaRA framework. The code, datasets, and pretrained models are available at https://github.com/LostXine/LLaRA.


Inner Monologue: Embodied Reasoning through Planning with Language Models

arXiv.org Artificial Intelligence

Recent works have shown how the reasoning capabilities of Large Language Models (LLMs) can be applied to domains beyond natural language processing, such as planning and interaction for robots. These embodied problems require an agent to understand many semantic aspects of the world: the repertoire of skills available, how these skills influence the world, and how changes to the world map back to the language. LLMs planning in embodied environments need to consider not just what skills to do, but also how and when to do them - answers that change over time in response to the agent's own choices. In this work, we investigate to what extent LLMs used in such embodied contexts can reason over sources of feedback provided through natural language, without any additional training. We propose that by leveraging environment feedback, LLMs are able to form an inner monologue that allows them to more richly process and plan in robotic control scenarios. We investigate a variety of sources of feedback, such as success detection, scene description, and human interaction. We find that closed-loop language feedback significantly improves high-level instruction completion on three domains, including simulated and real table top rearrangement tasks and long-horizon mobile manipulation tasks in a kitchen environment in the real world.


CornerNet : Detecting Objects as Paired Keypoints

#artificialintelligence

CornerNet is a different object detection technique where we detects the objects bounding box by a paired key-points, the top-left corner and the bottom-right corner using a single convolution neural network. By detecting the key points, it eliminates the need of different anchor boxes commonly used in single stage detectors. In this paper by Hei Law and Jia Deng from Princeton University, they have introduced a new approach to object detection which outperforms all the single stage detectors. CornetNet introduces a new type of pooling layer called Corner Pooling, that helps localizing the corners. The Net achieves 42.2% AP on MS COCO dataset.


Amazon's Alexa WILL listen to everything you say

Daily Mail - Science & tech

Alexa's poor reputation for privacy may soon worsen as a patent filed by the firm suggests the virtual assistant may start listening before its'wake word' is said. Under the plans Alexa will be able to detect when it is being given a command even if the wake word is said at the end of the sentence instead of at the front. The move raises concerns over user privacy as Alexa will, by default, always be listening to conversations on the off-chance its wakeword is spoken. Alexa's poor reputation for privacy may soon worsen as a patent filed by the firm suggests the virtual assistant may start listening before its'wake word' is said. The patent, filed with the US Patent and Trademark Office, reveals the Seattle-fimrs plans for the next evolutionary step for it Alexa's technology.


CNNs, Part 1: An Introduction to Convolutional Neural Networks - victorzhou.com

#artificialintelligence

There's still much more that we haven't covered yet, such as how to actually train a CNN. Part 2 of this CNN series will do a deep-dive on training a CNN, including deriving gradients and implementing backprop. Subscribe to my newsletter if you want to get an email when Part 2 comes out (soon)! If you're eager to see a trained CNN in action: this example Keras CNN trained on MNIST achieves 99.25% accuracy.


A Beginner's Guide To Understanding Convolutional Neural Networks Part 1

@machinelearnbot

When you first heard of the term convolutional neural networks, you may have thought of something related to neuroscience or biology, and you would be right.


Here's Waldo: Computing the optimal search strategy for finding Waldo

#artificialintelligence

As I found myself unexpectedly snowed in this weekend, I decided to take on a weekend project for fun. While searching for something to catch my fancy, I ran across an old Slate article claiming that they found a foolproof strategy for finding Waldo in the classic "Where's Waldo?" book series. Now, I'm no Waldo-spotting expert, but even I could tell that the strategy they proposed there is far from perfect. That's when I decided what my weekend project would be: I was going to pull out every machine learning trick in my tool box to compute the optimal search strategy for finding Waldo. I was going to crush Slate's supposed foolproof strategy and carve a trail of defeated Waldo-searchers in my wake.